The development of Tagged Uyghur Corpus

Authors

Yusup Aibaidula

Kim-Teng Lua

Abstract

This paper introduces the history and development of the Uyghur language. After a brief introduction to the development of Uyghur words, morphology and syntax, we describe our development of a computer-aided tagging system for contemporary Uyghur. We explain the coverage of the corpus, the building of resources, the rules for segmenting and tagging stems (etyma) and suffixes (terminations), and the tagging of a corpus with a small tagset. We also propose some practical methods for solving problems in Uyghur language tagging.

Keywords: history and development of the Uyghur language, Uyghur tagged corpus, Uyghur language tagging system.

Related resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or finding all expressions in a text that refer to the same entity, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. The main task of a coreference resolution system is therefore to identify terms that refer to a unique entity. A coreference resolution tool could be...

The Research and Development of Computer Aided Contemporary Uighur Language Tagging System

Research on contemporary Uyghur information processing dates back to the beginning of the 1980s. There have been many achievements so far: for example, fundamental theory and basic utilities have been established, various corpora have been built, a code standard has been designed, and research on Uyghur grammatical attributes has been carried out. Because the Uyghur language has its own nature, the...

Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition

This thesis studies disambiguation methods for Uyghur-Chinese translation. To address the shortage of disambiguation corpora for Uyghur, it proposes a design for automatically acquiring a translation label library. It refers to the existing Uyghur-Chinese bilingual dictionary, Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

A'lam Corpus (پیکره اعلام): A Standard Corpus of Named Entities for the Persian Language

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. An NER system is tasked with determining the boundaries of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...

Publication date: 2003
